Interactive Learning from Policy-Dependent Human Feedback

Authors

James MacGlashan

Mark K. Ho

Robert Tyler Loftin

Bei Peng

Guan Wang

David L. Roberts

Matthew E. Taylor

Michael L. Littman

Abstract

For agents and robots to become more useful, they must be able to quickly learn from non-technical users. This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner’s current policy. We present empirical results that show this assumption to be false: whether human trainers give positive or negative feedback for a decision is influenced by the learner’s current policy. We argue that policy-dependent feedback, in addition to being commonplace, enables useful training strategies from which agents should benefit. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot, even with noisy image features.
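The abstract names an actor-critic style learner driven by human feedback but does not give its update rule. As a rough illustration only, the sketch below assumes that a scalar feedback signal (+1 or -1) stands in for the advantage term in an ordinary policy-gradient step on a softmax policy. The function names, the step size, and the three-action setup are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_update(prefs, action, feedback, step_size=0.5):
    """Return new preferences after one human feedback signal.

    Assumption: the feedback value scales the gradient of
    log pi(action), the way an advantage estimate would in a
    standard policy-gradient update. For a softmax policy that
    gradient w.r.t. prefs[i] is (1 if i == action else 0) - pi[i].
    """
    probs = softmax(prefs)
    return [
        p + step_size * feedback * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


# Hypothetical usage: three actions, the trainer rewards action 1.
prefs = [0.0, 0.0, 0.0]
prefs = feedback_update(prefs, action=1, feedback=+1.0)
print(softmax(prefs))
```

Because the step is scaled by how unlikely the chosen action currently is, the same feedback value moves the policy less once the action is already probable, which is one simple way a learner can accommodate feedback that depends on its current policy.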

Similar resources

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

A long-term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shapin...

Policy Shaping from Simulated Critique in Domains with Multiple Optimal Policies

In many domains, there exist multiple ways for an agent to achieve optimal performance. Feedback may be provided along one or more of them to aid learning. In this work, we evaluate the interactive reinforcement learning algorithm Policy Shaping in domains with multiple optimal policies. We codify different feedback strategies as automated oracles and analyze their effect on the agent’s learnin...

The Impact of Task-supported Interactive Feedback on the Accuracy, Fluency, and Organization of Iranian EFL Learners’ Writing

Researchers in second language research have not yet resolved the controversy over the pedagogical efficacy of feedback in enhancing various features of learners’ writing skill. Research findings highlighting the significance of interactive tasks and learners’ engagement in improving the learning process stimulated the present study, the purpose of which was to explore the effect of task-...

Interactive reinforcement learning for task-oriented dialogue management

Dialogue management is the component of a dialogue system that determines the optimal action for the system to take at each turn. An important consideration for dialogue managers is the ability to adapt to new user behaviors unseen during training. In this paper, we investigate policy gradient based methods for interactive reinforcement learning where the agent receives action-specific feedback...

Supervised autonomy for online learning in human-robot interaction

When a robot is learning, it needs to explore its environment and how its environment responds to its actions. When the environment is large and there are a large number of possible actions the robot can take, this exploration phase can take prohibitively long. However, exploration can often be optimised by letting a human expert guide the robot during its learning. Interactive machine learning,...

Publication date: 2017
